Cluster-specific Named Entity Transliteration

نویسنده

  • Fei Huang
چکیده

Existing named entity (NE) transliteration approaches often exploit a general model to transliterate NEs, regardless of their origins. As a result, both a Chinese name and a French name (assuming it is already translated into Chinese) will be translated into English using the same model, which often leads to unsatisfactory performance. In this paper we propose a cluster-specific NE transliteration framework. We group name origins into a smaller number of clusters, then train transliteration and language models for each cluster under a statistical machine translation framework. Given a source NE, we first select appropriate models by classifying it into the most likely cluster, then we transliterate this NE with the corresponding models. We also propose a phrasebased name transliteration model, which effectively combines context information for transliteration. Our experiments showed substantial improvement on the transliteration accuracy over a state-of-the-art baseline system, significantly reducing the transliteration character error rate from 50.29% to 12.84%.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Clustered-Specific Named Entity Transliteration

Existing named entity (NE) transliteration approaches often exploit a general model to transliterate NEs, regardless of their origins. As a result, both a Chinese name and a French name (assuming it is already translated into Chinese) will be translated into English using the same model, which often leads to unsatisfactory performance. In this paper we propose a cluster-specific NE transliterat...

متن کامل

Named Entity Recognition and Transliteration for Telugu Language

The concept of transliteration is a wonderful art in Machine Translation. The translation of named entities is said to be transliteration. Transliteration should not be confused with translation, which involves a change in language while preserving meaning. Transliteration performs a mapping from one alphabet into another. In a broader sense, the word transliteration is used to include both tra...

متن کامل

A Hybrid Approach of English- Hindi Named-entity Transliteration

In recent years, machine transliteration has gained a center of attention for research. Both machine translation and transliteration are important for e-governance and web based online multilingual applications. As machine translation translate source language to target language which results in wrong translation for named entities. Named entities are required to be translated with preserving t...

متن کامل

Named Entity Translation with Web Mining and Transliteration

This paper presents a novel approach to improve the named entity translation by combining a transliteration approach with web mining, using web information as a source to complement transliteration, and using transliteration information to guide and enhance web mining. A Maximum Entropy model is employed to rank translation candidates by combining pronunciation similarity and bilingual contextu...

متن کامل

Some Experiments in Mining Named Entity Transliteration Pairs from Comparable Corpora

Parallel Named Entity pairs are important resources in several NLP tasks, such as, CLIR and MT systems. Further, such pairs may also be used for training transliteration systems, if they are transliterations of each other. In this paper, we profile the performance of a mining methodology in mining parallel named entity transliteration pairs in English and an Indian language, Tamil, leveraging l...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005